ABSTRACT

Subjects were shown scatterplots and were asked to judge the amount of association between
the two variables. Judged association increased when the scales on the horizontal and vertical
axes were simultaneously increased so that the size of the point cloud within the frame of the
plot decreased. Judged association was very different from the correlation coefficient, r, which
is the most widely used measure of association.

Graphs are mainstays of the analysis and presentation of scientific data. One reason for this
is that numerical summaries cannot always portray data unambiguously. For example, the most
common measure of the association, or relationship, between two variables, x_i and y_i,
is the absolute value of the correlation coefficient, r, which measures the amount of linear
association between two variables (1). When there is no linear association, |r| is 0; when there
is perfect linear association, so that x_i and y_i lie along a straight line, |r| is 1. However,
different configurations of points can yield the same value of r, relationships can be nonlinear,
and a single value of (x_i, y_i) can radically alter r (2,3). A scatterplot can depict the relationship
between x_i and y_i more reliably than any single numerical measure. For this reason the
scatterplot is a very commonly used tool for the investigation and presentation of the
relationship between two variables. But the use of a graph opens the door for perceptual factors
to enter into the analysis and interpretation of the data. While a set of data has only one
numerical value for a particular measure of association such as r, the judged association could
change according to any one of a number of “display factors” such as the size of the plotting
character, the overall size of the display, the orientation of the point cloud within the frame,
and the size of the point cloud within the frame. The last two factors are controlled by the
scales of the vertical and horizontal axes in graphs with a fixed-size plotting area.
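
The remark that a single point can radically alter r is easy to check numerically. The sketch below is only an illustration: the ten base points and the added point (20, 20) are made-up values, not data from the experiments, and `pearson_r` is a plain textbook implementation of the correlation coefficient.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: ten points arranged symmetrically, so r is 0.
xs = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
ys = [1, 2, 3, 2, 1, 3, 2, 1, 2, 3]
r_without = pearson_r(xs, ys)

# The same points plus one far-away (hypothetical) observation.
r_with = pearson_r(xs + [20], ys + [20])

print(f"r without the extra point: {r_without:.3f}")
print(f"r with the extra point:    {r_with:.3f}")
```

With these made-up values the eleventh point alone moves r from 0 to above .9, although the other ten points still show no linear association.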

To investigate how people judge association from scatterplots and how display factors affect
their judgments, we ran three experiments. In the first experiment 74 subjects viewed 19
scatterplots, all with 0 or positive correlation coefficients. The subjects were asked to judge
linear association on a scale from 0 to 100; 0 meant no linear association (r = 0) and 100 meant
perfect linear association (r = 1). All subjects had some statistical training and the concept of
linear association was meaningful to them. The scatterplots in Figure 1 are reductions of two of
the stimuli from this experiment; the reader is invited to judge the association on these plots in
order to understand the nature of the judgment task.

We varied two factors: amount of association and point-cloud size. The size of the frame
was kept fixed. There were 10 levels of association; each scatterplot had a value of
$w(r) = 1 - \sqrt{1 - r^2}$ equal to one of the values 0, .05, .1, .2, ..., .8. w(r) is another numerical
measure of linear association that goes from 0 to 1 as r goes from 0 to 1; an interpretation of
w(r) in terms of the geometry of the point cloud will be given later. We used w(r) and not r
since w(r) seemed, a priori, closer to people's subjective scales than r. There were four
point-cloud sizes; they are labeled 1 to 4, where size 1 is the smallest and size 4 is the largest.
For point-cloud size 3 there were 10 scatterplots with the 10 different values of w(r), and for
each of the other point-cloud sizes there were 3 scatterplots with values of w(r) equal to .1, .4,
and .7; thus altogether there were 19 scatterplots. For both panels in Figure 1,
w(r) = .4 and r = .8; the left panel is point-cloud size 2 and the right panel is point-cloud size
4.
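
The design just described can be enumerated in a few lines. This is a minimal sketch, assuming that the ten levels of w(r) are 0, .05, .1 and then steps of .1 up to .8 (the reading that gives ten values); the names `w`, `r_from_w`, `W_LEVELS` and `design` are illustrative and do not come from the original study.

```python
import math

def w(r):
    """w(r) = 1 - sqrt(1 - r^2), the measure used to set the association levels."""
    return 1.0 - math.sqrt(1.0 - r * r)

def r_from_w(w_value):
    """Invert w(r) for 0 <= r <= 1."""
    return math.sqrt(1.0 - (1.0 - w_value) ** 2)

# Assumed reading of the ten levels of w(r).
W_LEVELS = [0.0, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]

# Size 3 is crossed with every level; sizes 1, 2 and 4 with three levels only.
design = [(3, level) for level in W_LEVELS]
design += [(size, level) for size in (1, 2, 4) for level in (0.1, 0.4, 0.7)]

for size, level in design:
    print(f"size {size}: w(r) = {level:.2f}, r = {r_from_w(level):.3f}")
print(len(design), "scatterplots")
```

The listing has 19 rows, and w(r) = .4 maps back to r = .8, the pair of values quoted for Figure 1.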

Each scatterplot had 200 points and a square frame with sides equal to 17.3 cm. In all cases
the center of gravity of the point cloud was at the center of the frame. The values portrayed on
the horizontal axis of the k-th scatterplot, x_i(k), for i = 1, ..., 200, and the values portrayed on
the vertical axis, y_i(k), for i = 1, ..., 200, formed a bivariate super-normal point cloud (4), which
ensured highly regular behavior: a linear relationship, no peculiar points, and an elliptical
appearance. The major axis of each point cloud was the line y = x and the minor axis was the
perpendicular line through the center of the cloud.
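
A point cloud with a prescribed correlation can be sketched as follows. This is not the super-normal construction of reference (4), which is not described here; it is a stand-in that draws ordinary pseudo-random normal values and then forces the sample correlation to equal r exactly. The function names and the `center`, `sd` and `seed` parameters are hypothetical.

```python
import math
import random

def standardize(values):
    """Shift to mean 0 and rescale so that the mean square is 1."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    rms = math.sqrt(sum(c * c for c in centered) / n)
    return [c / rms for c in centered]

def point_cloud(r, n=200, center=5.0, sd=1.0, seed=0):
    """Return xs, ys whose sample correlation is r and whose means equal center."""
    rng = random.Random(seed)
    z1 = standardize([rng.gauss(0.0, 1.0) for _ in range(n)])
    z2 = standardize([rng.gauss(0.0, 1.0) for _ in range(n)])
    # Remove from z2 its component along z1, so the two are exactly uncorrelated.
    slope = sum(a * b for a, b in zip(z1, z2)) / n
    z2 = standardize([b - slope * a for a, b in zip(z1, z2)])
    s = math.sqrt(1.0 - r * r)
    xs = [center + sd * a for a in z1]
    ys = [center + sd * (r * a + s * b) for a, b in zip(z1, z2)]
    return xs, ys

def sample_corr(xs, ys):
    """Sample correlation coefficient."""
    zx, zy = standardize(xs), standardize(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)

xs, ys = point_cloud(0.8)
print(f"n = {len(xs)}, sample r = {sample_corr(xs, ys):.6f}")
```

Because both coordinates share the same standard deviation and the same center, the long axis of the resulting cloud runs along the line y = x, as in the stimuli.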

The minimum value portrayed on the two axes of all plots was 0 data units and the
maximum value was 5.6, 7, 10, or 14 data units. Since the length of each axis was 17.3 cm, the
four scale values were .32, .40, .58, and .81 data units/cm. The effect of decreasing the scale
was to increase the size of the point cloud within the frame.
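
The four scale values follow directly from the axis maxima and the frame size. The short sketch below redoes that arithmetic and also prints how many centimeters one data unit occupies at each scale; the variable names are illustrative.

```python
FRAME_SIDE_CM = 17.3                      # length of each axis
AXIS_MAXIMA = [5.6, 7.0, 10.0, 14.0]      # data units; every axis starts at 0

# Scale in data units per cm, rounded to two decimals as quoted in the text.
scales = [round(axis_max / FRAME_SIDE_CM, 2) for axis_max in AXIS_MAXIMA]

# Length on the page, in cm, of one data unit at each scale.
cm_per_unit = [FRAME_SIDE_CM / axis_max for axis_max in AXIS_MAXIMA]

for axis_max, scale, cm in zip(AXIS_MAXIMA, scales, cm_per_unit):
    print(f"axis 0 to {axis_max:>4}: {scale:.2f} data units/cm, "
          f"1 data unit = {cm:.2f} cm")
```

A cloud of fixed extent in data units therefore shrinks on the page as the axis maximum grows, which is how a larger scale produces a smaller point cloud.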

There were 4 orders of presentation of the 19 scatterplots, with approximately 1/4 of the
subjects judging each order. Two of the orders were random and the other two were the
reverses of these.

Subjects judged the scatterplots in stapled booklets with 8-1/2" x 11" pages. First there were
written instructions and sample scatterplots, then four trial plots that subjects judged, and
finally the 19 experimental plots, each on a separate page. Subjects were asked to give their
own subjective assessment of the amount of linear association, rather than to judge the
correlation coefficient, and were asked not to look back or change old answers. It was
suggested that they work reasonably quickly and that most people could comfortably make a
single judgment within 15 seconds. Subjects, all of whom had a basic knowledge of statistics,
fell into three categories: students taking university courses in statistics, university faculty
members in statistics and mathematics, and statisticians practicing statistics in government and
industry.

Our data analyses made extensive use of 10% trimmed means (5), which are defined in the
following way: Order the observations from smallest to largest; drop the largest 10% of the
observations and the smallest 10%; take the arithmetic average of the remaining values. 10%
trimmed means are robust estimates (6,7) since they are not distorted by a small fraction of
outliers, and they are a compromise between arithmetic means, which are 0% trimmed means,
and medians, which are trimmed means close to the 50% level. The standard errors of 10%
trimmed means can be computed from a formula given in (8).
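
The definition above translates almost word for word into code. In this minimal sketch the function name `trimmed_mean` is illustrative, and rounding the number of dropped observations down when 10% of the sample is not a whole number is an assumption; the text does not say how that case was handled.

```python
def trimmed_mean(values, proportion=0.10):
    """Average what is left after dropping `proportion` of the values at each end."""
    ordered = sorted(values)
    n = len(ordered)
    g = int(proportion * n)        # assumption: round the trim count down
    kept = ordered[g:n - g]
    return sum(kept) / len(kept)

# Hypothetical judgments with one wild value.
judgments = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]

print(trimmed_mean(judgments, 0.0))    # ordinary mean, dragged up by the outlier
print(trimmed_mean(judgments))         # 10% trimmed mean
print(trimmed_mean(judgments, 0.4))    # heavy trimming, close to the median
```

With these made-up values the ordinary mean is 104.5 while the 10% trimmed mean is 5.5, which shows why a small fraction of outliers does not distort the trimmed estimate.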

Judged association for each of the 19 scatterplots was summarized by 10% trimmed means
of the subjects’ guesses, which were on a scale of 0 to 100, divided by 100. These values are
plotted in Figure 2 against the actual values of r for the 19 scatterplots; also portrayed are the
standard errors of the trimmed means. The two curves are w(r) and $g(r) = 1 - \sqrt{(1 - r)/(1 + r)}$.
g(r) is another measure of linear association that goes from 0 to 1 as r goes from 0 to 1; an
interpretation of g(r) in terms of the geometry of the point cloud will be given later.

Figure 2 shows that judged association is quite different from the standard numerical
measure, r, since the 10% trimmed means lie well below the line y = x. This result has been
found in two other experiments (9,10) in which subjects were asked to guess the correlation
coefficient from scatterplots and in experiments in which the amount of association was judged
on the basis of other kinds of stimuli (11). Interestingly, these results also correspond to a
statement of Wilk (12): “it is felt by some [applied statisticians] that values of |r| below .5
are quite ‘small’, while r is ‘large’ only when |r| is above .8, and r is ‘really large’ (close to a
linear dependence of the variables) only when |r| is above .95.” Wilk argues that w(r) is a
more sensible numerical measure of association than r; Figure 2 shows that w(r) does come
closer to describing the perceived association for our subjects than does r.

Figure 2. The 10% trimmed means across subjects of judged association divided by 100 for 19
scatterplots are plotted (by the circle centers) against the values of r, the correlation coefficient,
of the scatterplots. The circle radii portray the standard errors of the trimmed means. Thus
the circle areas are proportional to the estimated variances of the trimmed means. The
numbers to the left of the circles indicate the point-cloud sizes. When two numbers are shown,
separated by a comma, two circles are nearly coincident and the first number refers to the circle
with the smaller trimmed mean. The upper solid curve on the plot is g(r) and the lower solid
curve is w(r). The dashed line is the line y = x. The information on the plot leads to two
conclusions: judged association tends to increase as the point-cloud size decreases due to
increasing scale; judged association is very different from the standard numerical measure of
association, the correlation coefficient.

Figure 2 also shows that the tendency is for judged association to increase as the point-cloud
size decreases due to the increase in the scales; the effect is most pronounced when w(r) = .4.
In all cases the perceived associations for sizes 1 and 2 are greater than those for sizes 3 and 4.
The effect, however, does not appear to extend beyond size 2; for all three values of w(r), the
trimmed mean for point-cloud size 2 is either very close to that of size 1 or somewhat greater.
And sizes 3 and 4 differ from one another by a nontrivial amount only for w(r) = .4 (r = .8).

To investigate the statistical significance of the effect of changing scale we performed the
following operations: For each subject and each level of w(r) in which scale was varied
(w(r) = .1, .4, .7) we subtracted the subject’s estimate for the largest point-cloud size, 4, from
each of the estimates for the other three sizes, which for each subject yielded 3 differences for
each of the 3 levels of w(r); then we computed 10% trimmed means and their standard errors
across the subjects. Each trimmed mean divided by its standard error has, approximately, a
t-distribution with 57 degrees of freedom (8); this distributional result can be used to test the
significance of the difference of the point-cloud size 4 response from the responses to the other
sizes. For w(r) = .1 the size 4 response is significantly different (at the .01 level) only from
the size 2 response; for w(r) = .4 the size 4 response is significantly different from all three of
the other responses; for w(r) = .7 the size 4 response is significantly different only from the
size 1 response.
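
The operations in this test can be sketched as below. Several things here are assumptions rather than details taken from the study: the judgments are pseudo-random stand-ins, the standard error uses the winsorized sum of squares (the Tukey and McLaughlin form, assumed to be what reference (8) gives), and the number of observations cut from each end, `g`, is passed in directly. With 74 subjects, g = 8 leaves 58 observations and hence the 57 degrees of freedom quoted above; how the fractional count 7.4 was actually handled is not stated.

```python
import math
import random

def trimmed_mean_and_se(values, g):
    """Trimmed mean, standard error, and degrees of freedom with g cut per end."""
    ordered = sorted(values)
    n = len(ordered)
    h = n - 2 * g                              # number of observations kept
    kept = ordered[g:n - g]
    mean = sum(kept) / h
    # Winsorize: pull each trimmed value in to the nearest kept value.
    winsorized = [kept[0]] * g + kept + [kept[-1]] * g
    w_mean = sum(winsorized) / n
    ssd = sum((v - w_mean) ** 2 for v in winsorized)
    se = math.sqrt(ssd / (h * (h - 1)))        # assumed standard-error formula
    return mean, se, h - 1

# Hypothetical judgments (0 to 100) from 74 subjects at one level of w(r).
rng = random.Random(1)
size4 = [rng.uniform(20, 60) for _ in range(74)]
size2 = [v + rng.gauss(8, 10) for v in size4]

# Per-subject difference from the size 4 judgment, rescaled to the 0 to 1 range.
differences = [(a - b) / 100 for a, b in zip(size2, size4)]

mean, se, df = trimmed_mean_and_se(differences, g=8)
t_ratio = mean / se
print(f"trimmed mean {mean:.3f}, SE {se:.3f}, t = {t_ratio:.2f} on {df} df")
```

The ratio `t_ratio` would then be compared with the two-sided critical value of a t-distribution on `df` degrees of freedom at the chosen level.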

We ran a second experiment to check, under different conditions, this effect of scale. 109
subjects in 3 groups of 27, 36, and 46 people were shown, alternately, the two scatterplots in
Figure 1 by an overhead transparency projected onto a screen in the front of a room. Subjects
were asked to assess the association of each plot on a scale of 0 to 100. The 10% trimmed
mean of

$$\frac{(\text{judgment for point-cloud size 2}) - (\text{judgment for point-cloud size 4})}{100}$$

across subjects is .0(8 with a standard error of .011. The 10% trimmed mean of the
corresponding values for the subjects in the first experiment is .123 with a standard error of

We ran a third experiment to further investigate the results. Thirty-two subjects in a single
group were shown the scatterplots in Figure 1 in the same manner as the subjects in the second
experiment. But in this case subjects were told that the correlation coefficients of the two
scatterplots were the same and were asked to indicate whether one of the two “looked” more
highly correlated than the other and if so, which one. 66% indicated that the size 2 scatterplot
looked more correlated, 13% indicated the size 4 scatterplot, and 22% said they looked the
same. This has the same pattern as in the first experiment, where the corresponding percents
are 81%, 18%, and 13%, and in the second experiment, where the corresponding percents are
39%, 11%, and 30%.

Thus the second and third experiments strongly corroborated the conclusion of the first
experiment: increasing the scales on the horizontal and vertical axes of a scatterplot, so as to
decrease the point-cloud size, increases the judged association.


Knowing what perceptual strategies people employ in judging association from scatterplots
might not only provide an explanation of the effect of scale in our three experiments, but might
also enable more effective design of scatterplots. The point clouds on the scatterplots in our
experiments have an elliptical look to them because the bivariate normal distribution, from
which we can think of the points as arising, has a density with elliptical contours. Two of the
features of the point clouds are the ratio of the lengths of the minor and major axes and the
area. Subjects might be using either to judge association.
The ratio of the minor axis to the major axis of a contour of the associated bivariate normal
distribution is $\sqrt{(1 - r)/(1 + r)}$, since the standard deviations of x_i(k) and y_i(k) are equal and
since the scales on the horizontal and vertical axes of each scatterplot are the same. If subjects
were judging association by judging the ratio of the axes of the point cloud, then the judged
scale would be g(r), which, as described earlier, is shown in Figure 2.

The area of an elliptical contour of the associated bivariate normal distribution divided by
the area of a rectangle with sides parallel to the horizontal and vertical axes of the plot is
proportional to $\sqrt{1 - r^2}$. If subjects were judging association by judging the areas of the point
clouds relative to a circumscribed rectangle, the judged scale would be w(r), which, as described
earlier, is shown in Figure 2.
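
Both geometric quantities can be checked numerically by tracing a contour of a bivariate normal distribution with unit standard deviations and correlation r. The sketch below measures the axis ratio and the share of the circumscribed, axis-parallel rectangle that the ellipse covers, and prints them next to the closed forms sqrt((1 - r)/(1 + r)) and (pi/4) sqrt(1 - r^2). The function names and the number of boundary points are arbitrary choices.

```python
import math

def contour_points(r, m=3600):
    """Points on the contour x^2 - 2rxy + y^2 = 1 - r^2 (unit standard deviations)."""
    s = math.sqrt(1.0 - r * r)
    angles = (2.0 * math.pi * k / m for k in range(m))
    return [(math.cos(t), r * math.cos(t) + s * math.sin(t)) for t in angles]

def axis_ratio(r):
    """Minor axis length divided by major axis length, measured on the contour."""
    radii = [math.hypot(x, y) for x, y in contour_points(r)]
    return min(radii) / max(radii)

def area_fraction(r):
    """Ellipse area divided by the area of its circumscribed axis-parallel rectangle."""
    pts = contour_points(r)
    following = pts[1:] + pts[:1]
    # Shoelace formula for the polygon traced by the contour points.
    area = 0.5 * abs(sum(x0 * y1 - x1 * y0
                         for (x0, y0), (x1, y1) in zip(pts, following)))
    width = max(x for x, _ in pts) - min(x for x, _ in pts)
    height = max(y for _, y in pts) - min(y for _, y in pts)
    return area / (width * height)

for r in (0.0, 0.5, 0.8, 0.95):
    print(f"r = {r:.2f}: axis ratio {axis_ratio(r):.4f} "
          f"(closed form {math.sqrt((1 - r) / (1 + r)):.4f}), "
          f"area fraction {area_fraction(r):.4f} "
          f"(closed form {math.pi / 4 * math.sqrt(1 - r * r):.4f})")
```

One minus the axis ratio, and one minus sqrt(1 - r^2), each run from 0 to 1 as r goes from 0 to 1, which is the property the text attributes to g(r) and w(r).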

Neither of the curves w(r) and g(r) appears to describe the judged association. It could be,
however, that one of the two geometrical tasks, judging axis ratios or judging areas, is
being carried out, but that there are biases in the judgments that alter the perceived association.
For example, it is known that judgments of area and length tend to be proportional not to the
physical quantity, but rather to the physical quantity to a power less than 1 (13). We have
begun a series of new experiments to attempt to better understand the perceptual mechanism
that people use in judging association.
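
To see how such a bias could move judgments away from w(r), consider a purely illustrative model in which the perceived relative area of the cloud is the physical relative area raised to a power. The exponent 0.7 below is a hypothetical value chosen only because it is less than 1; it is not an estimate from these experiments, and the model itself is an assumption.

```python
import math

HYPOTHETICAL_EXPONENT = 0.7   # any power below 1 would serve for the illustration

def relative_area(r):
    """Contour area relative to its value at r = 0, i.e. sqrt(1 - r^2)."""
    return math.sqrt(1.0 - r * r)

def judged_from_area(r, exponent=1.0):
    """Judged association if perceived area is physical area to the given power."""
    return 1.0 - relative_area(r) ** exponent

for r in (0.0, 0.4, 0.6, 0.8, 0.95):
    unbiased = judged_from_area(r)     # equals w(r)
    biased = judged_from_area(r, HYPOTHETICAL_EXPONENT)
    print(f"r = {r:.2f}: w(r) = {unbiased:.3f}, with power-law bias = {biased:.3f}")
```

Under this assumed model an exponent below 1 lowers the judged association for every r strictly between 0 and 1, so an area-based strategy combined with such a bias would trace a curve beneath w(r).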